Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction
In VP9 video codec, the sizes of blocks are decided during encoding by
recursively partitioning 64×64 superblocks using rate-distortion
optimization (RDO). This process is computationally intensive because of the
combinatorial search space of possible partitions of a superblock. Here, we
propose a deep learning based alternative framework to predict the intra-mode
superblock partitions in the form of a four-level partition tree, using a
hierarchical fully convolutional network (H-FCN). We created a large database
of VP9 superblocks and the corresponding partitions to train an H-FCN model,
which was subsequently integrated with the VP9 encoder to reduce the intra-mode
encoding time. The experimental results establish that our approach speeds up
intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in
the Bjontegaard delta bitrate (BD-rate). While VP9 provides several built-in
speed levels which are designed to provide faster encoding at the expense of
decreased rate-distortion performance, we find that our model is able to
outperform the fastest recommended speed level of the reference VP9 encoder for
the good quality intra encoding configuration, in terms of both speedup and
BD-rate.
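The recursive RDO search described above can be sketched as follows. This is a toy illustration, not the VP9 implementation: the cost function is a hypothetical SSE-plus-overhead proxy standing in for the real intra rate-distortion cost, and only square quad-splits are modeled.

```python
import numpy as np

OVERHEAD = 100.0  # hypothetical per-block signaling cost (illustrative)

def block_cost(block):
    # Toy stand-in for an intra RD cost: SSE around the block mean,
    # plus a fixed overhead term. A real encoder evaluates actual
    # prediction residuals and coding rates here.
    return float(np.sum((block - block.mean()) ** 2)) + OVERHEAD

def partition(block, min_size=8):
    """Recursively choose between coding a square block whole or
    splitting it into four quadrants, mirroring the four-level
    64x64 -> 8x8 partition tree searched during VP9 intra encoding.

    Returns (cost, tree), where tree is 'leaf' or a list of four subtrees.
    """
    no_split = block_cost(block)
    if block.shape[0] <= min_size:
        return no_split, 'leaf'
    h = block.shape[0] // 2
    quads = [block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]]
    results = [partition(q, min_size) for q in quads]
    split_cost = sum(c for c, _ in results)
    if split_cost < no_split:
        return split_cost, [t for _, t in results]
    return no_split, 'leaf'

# A superblock with four flat but distinct quadrants splits once, then stops.
sb = np.zeros((64, 64))
sb[:32, 32:] = 100.0
sb[32:, :32] = 200.0
sb[32:, 32:] = 300.0
cost, tree = partition(sb)  # tree == ['leaf', 'leaf', 'leaf', 'leaf']
```

Even in this simplified form, the search visits every node of the partition tree, which is the per-superblock workload the H-FCN prediction replaces with a single inference.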
Deep learning solutions for video encoding and streaming
Video data has emerged as the top contributor to the global internet traffic, and video compression is the key technology that enables its efficient storage, transmission and retrieval. As the video compression technology advances to keep pace with the proliferation of video data, state of the art video codecs that rely on block based hybrid coding tend to become increasingly complex and computationally intensive. Moreover, currently, it appears challenging to significantly improve video compression efficiency by solely relying on traditional approaches. Consequently, deep learning techniques are being extensively explored in the context of designing video compression technologies. My research addresses the problem of making the benefits of data driven deep learning accessible to some key areas of video coding and compression based video streaming technologies.
First, this dissertation introduces a deep learning framework to speed up intra mode encoding in the VP9 video codec. In VP9, the sizes of blocks are decided by a computationally intensive rate-distortion optimization (RDO) process that evaluates the combinatorially complex search space of possible partitions of 64 × 64 superblocks. We devised a learning based alternative framework to predict the intra-mode superblock partitions using a hierarchical fully convolutional network (H-FCN), which was experimentally shown to speed up the intra-mode encoding of the reference VP9 encoder. Subsequently, our work on deep learning based block motion estimation is expounded. Block based motion estimation is essential for performing inter-prediction in hybrid codecs, the mechanism responsible for the bulk of the compression they achieve. However, prevalent block matching procedures that are used to compute block motion vectors (MVs) are computationally intensive, are prone to detecting spurious motions that worsen at smaller block sizes, and are agnostic to the perceptual quality of the predicted frames. To address these issues, we developed a composite block translation network (CBT-Net) that jointly predicts the MVs of blocks having multiple sizes by using the MVs predicted for larger blocks to guide the motion estimation of smaller blocks. Our framework produces more coherent motion fields at smaller block sizes than traditional block matching based MV estimation, and is also computationally efficient. Its rate-distortion performance gains are demonstrated for AV1 encoding.
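The coarse-to-fine idea behind CBT-Net, where a larger block's motion guides its sub-blocks, can be illustrated with classical block matching rather than a network. All block sizes, search ranges, and the synthetic frames below are purely illustrative assumptions:

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences, the usual block-matching cost.
    return float(np.abs(a - b).sum())

def search_mv(cur, ref, y, x, size, center=(0, 0), radius=4):
    """Full search for the (dy, dx) within `radius` of `center` that
    minimizes the SAD between the current block and the reference."""
    best, best_mv = float('inf'), center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy and 0 <= xx and \
               yy + size <= ref.shape[0] and xx + size <= ref.shape[1]:
                cost = sad(cur[y:y+size, x:x+size],
                           ref[yy:yy+size, xx:xx+size])
                if cost < best:
                    best, best_mv = cost, (dy, dx)
    return best_mv

rng = np.random.default_rng(1)
ref = rng.integers(0, 255, (48, 48)).astype(np.float64)
cur = np.roll(ref, shift=(3, -2), axis=(0, 1))  # synthetic (3, -2) shift

# Coarse MV for the 16x16 block at (16, 16)...
mv16 = search_mv(cur, ref, 16, 16, 16, radius=4)
# ...seeds a cheap +/-1 refinement for each of its four 8x8 sub-blocks,
# instead of an independent full search per sub-block.
mv8 = [search_mv(cur, ref, 16 + oy, 16 + ox, 8, center=mv16, radius=1)
       for oy in (0, 8) for ox in (0, 8)]
```

Seeding the small-block search with the parent MV both shrinks the search and discourages the spurious, incoherent small-block motions that independent matching can produce; CBT-Net realizes the same hierarchy with learned joint prediction.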
The last part of this dissertation focuses on learning based approaches to designing compression based adaptive video streaming. Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content dependent bitrate ladder selection requires a video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step causes significant overhead in terms of both computation and time expenditure. To reduce this overhead, we employed a recurrent convolutional network (RCN) to implicitly analyze the spatiotemporal complexity of video shots in order to predict their convex hulls. The proposed RCN-Hull model substantially reduced the pre-encoding time needed for convex hull generation while closely approximating the optimal convex hulls. The competitive advantage of our method over existing ones based on heuristics or feature based machine learning was also demonstrated. The different deep learning frameworks introduced in this dissertation thus attest to the compelling advantages offered by deep learning based tools and techniques in driving the development and deployment of future video coding and streaming technologies.
Efficient Per-Shot Convex Hull Prediction By Recurrent Learning
Adaptive video streaming relies on the construction of efficient bitrate
ladders to deliver the best possible visual quality to viewers under bandwidth
constraints. The traditional method of content dependent bitrate ladder
selection requires a video shot to be pre-encoded with multiple encoding
parameters to find the optimal operating points given by the convex hull of the
resulting rate-quality curves. However, this pre-encoding step is equivalent to
an exhaustive search process over the space of possible encoding parameters,
which causes significant overhead in terms of both computation and time
expenditure. To reduce this overhead, we propose a deep learning based method
of content aware convex hull prediction. We employ a recurrent convolutional
network (RCN) to implicitly analyze the spatiotemporal complexity of video
shots in order to predict their convex hulls. A two-step transfer learning
scheme is adopted to train our proposed RCN-Hull model, which ensures
sufficient content diversity to analyze scene complexity, while also making it
possible to capture the scene statistics of pristine source videos. Our
experimental results reveal that our proposed model yields better
approximations of the optimal convex hulls, and offers competitive time savings
as compared to existing approaches. On average, the pre-encoding time was
reduced by 58.0% by our method, while the average Bjontegaard delta bitrate
(BD-rate) of the predicted convex hulls against the ground truth was 0.08%,
and the mean absolute deviation of the BD-rate distribution was 0.44%.
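The exhaustive step that RCN-Hull avoids can be made concrete with a small sketch: collect the (bitrate, quality) point of every pre-encode and keep only the upper convex hull, which yields the Pareto-optimal operating points of the bitrate ladder. The encode points below are synthetic, hypothetical numbers for illustration only:

```python
def cross(o, a, b):
    # 2D cross product of vectors o->a and o->b; sign gives turn direction.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def upper_convex_hull(points):
    """Upper convex hull of (rate, quality) points via a monotone chain,
    returned in order of increasing rate."""
    hull = []
    for p in sorted(set(points)):
        # Pop the last hull point while it makes a non-right turn,
        # i.e. it lies on or below the chord to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# Synthetic (kbps, quality-score) pre-encodes from three hypothetical
# resolutions of one shot.
encodes = [(300, 62), (600, 74), (900, 78),      # e.g. 540p
           (700, 70), (1400, 84), (2100, 88),    # e.g. 720p
           (1600, 80), (3200, 92), (4800, 95)]   # e.g. 1080p
ladder = upper_convex_hull(encodes)
# Dominated points such as (700, 70) and (1600, 80) are discarded.
```

Producing the nine `encodes` points requires nine full encodes per shot; predicting the hull directly from the source video is what removes most of that pre-encoding cost.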